Abstract

The cross-entropy method (CE) developed by R. Rubinstein is an elegant practical principle for simulating rare events. The method approximates the probability of the rare event by means of a family of probabilistic models. It has been extended to optimization by considering an optimal event as a rare event. CE works rather well when dealing with deterministic function optimization. However, two conditions appear to be needed for a good convergence of the method. First, the family of models must be flexible enough to discriminate the optimal events. Second, and more implicitly, the function to be optimized should be deterministic. The purpose of this paper is to consider the case of partially discriminating model families and of stochastic functions. It will be shown on simple examples that the CE can fail when these hypotheses are relaxed. Alternative improvements of the CE method are investigated and compared on random examples in order to handle this issue.

1 Introduction 

The Cross-Entropy method was developed by R. Rubinstein for the simulation of rare events. The algorithm iteratively builds a near-optimal importance sampling of the rare event, based on a family of parameterized sampling laws. The importance sampling is constructed by iteratively:

• drawing samples,

• selecting the samples which approximate the rare events,

• relearning the parameters of the sampling law by minimizing its Kullback-Leibler divergence (cross-entropy) with the selection,

• computing the importance weights.

By considering the optimal events related to an objective as rare events, the method has 
been extended to optimization problems. 

The cross-entropy method has been implemented successfully on many combinatorial problems. However, attempted proofs of the method make some preliminary assumptions […]. First, the proof has been made in a deterministic context. Second, the closure of the family of simulation laws should contain the Dirac at the optimum (or laws with support on the optima).
The first condition cannot be properly fulfilled in the case of a stochastic problem. The second condition is an obvious requirement. But there are cases where the law family cannot represent all the solutions precisely. Indeed, the solutions may be practically impossible to enumerate; this is typically the case for some dynamic problems (for example, the strategy tree against a deterministic computer chess player). Both difficulties are encountered in optimal planning with partial observation. The purpose of this paper is to show, on simple examples, that these hypotheses are necessary for the convergence of the classical CE method. The questions are:

• Does the classical CE algorithm solve stochastic problems properly? It appears that the quantile selection within the CE may not work properly without a rather good estimate of the expectation of the objective functional. Nevertheless, smoother selection criteria seem to be a possible answer to these difficulties.

• Assume that the closure of the law family does not contain all the deterministic solutions. The CE algorithm will then converge to a stochastic approximation of the optimal solution. Is this approximation the best possible within the law family? Our answer to this question is not entirely negative. But it appears that some commonly implemented extensions of the CE fail on this question.

This paper presents some counterexamples to these questions. In the case of stochastic optimization, tests are done on simple random examples in order to compare the convergence of various CE methods with the global optimum.

The next section briefly introduces the principle of the CE method. Section 3 considers the case where the optimal solution is not properly captured by the sampling family. A counterexample is proposed and studied. In section 4, stochastic problems are considered. Two simple counterexamples are investigated, illustrating some typical convergence difficulties. Different evolutions of the cross-entropy method are then compared to the basic method on several randomly generated examples. In particular, a method with smooth sample selection is proposed as a possible alternative for stochastic problems. Section 5 concludes.

2 Basis of the cross-entropy method 

The reader interested in CE methods should refer to the tutorial and the book […] on the CE method. CE algorithms were first dedicated to estimating the probability of rare events. A slight change of the basic algorithm made it also suitable for optimization. We will not focus on the cross-entropy method for simulation, although this primary aspect of the method is quite interesting. Rather, the CE method for optimization is now presented and discussed. While there are different evolutions of the primary method related to the choice of the selective rate or to a smooth update, this presentation is restricted to the basic CE method. It is not difficult to verify that the counterexamples proposed in sections 3 and 4 still work with these evolutions.

2.1 General CE algorithm for optimization

The Cross Entropy algorithm repeats the following three successive phases until convergence, in order to maximize a given reward criterion:

1. Generate samples of random data according to a parameterized random mechanism, 

2. Select the best samples according to the reward criterion, 

3. Update the parameters of the random mechanism, on the basis of the selected samples. 

In the particular case of CE, the update in phase 3 is obtained by minimizing the Kullback-Leibler divergence, or cross entropy, between the updated random mechanism and the selected samples. The next paragraphs describe, on a theoretical example, how such a method can be used in an optimization problem.
Formalism. Let a function x ↦ f(x) be given; this function is easily computable. The value f(x) has to be maximized by optimizing the choice of x ∈ X. The function f will be the reward criterion.

Now let a family of probabilistic laws (P_σ | σ ∈ Σ), applying to the variable x, be given. The family P is the parameterized random mechanism.

Let ρ ∈ ]0, 1[ be a selective rate. The CE algorithm for (x, f, P) follows the synopsis:

1. Initialize σ ∈ Σ,

2. Generate N samples x_n according to P_σ,

3. Select the ρN best samples according to the reward criterion f,

4. Update σ as a minimizer of the cross-entropy with the selected samples:

$$\sigma \in \arg\max_{\sigma \in \Sigma} \sum_{n\ \text{selected}} \ln P_\sigma(x_n),$$

5. Repeat from step 2 until convergence.

This algorithm requires f to be easily computable and the sampling of P_σ to be fast.
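The synopsis above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not in the text: X = {0,1}^n, a family of independent Bernoulli laws with parameter vector σ (for which the cross-entropy minimizer over the selected samples is simply the vector of empirical frequencies), a fixed iteration budget in place of a convergence test, and a hypothetical reward f.

```python
import random

def ce_maximize(f, n_bits, n_samples=100, rho=0.1, n_iter=50, seed=0):
    """Basic CE loop (steps 1-5) maximizing f over X = {0,1}^n_bits.

    Assumption: P_sigma is a product of independent Bernoulli(sigma_i) laws.
    """
    rng = random.Random(seed)
    sigma = [0.5] * n_bits                       # step 1: initialize sigma
    n_elite = max(1, int(rho * n_samples))
    for _ in range(n_iter):
        # step 2: generate N samples x_n according to P_sigma
        samples = [[1 if rng.random() < s else 0 for s in sigma]
                   for _ in range(n_samples)]
        # step 3: keep the rho*N best samples according to f
        elite = sorted(samples, key=f, reverse=True)[:n_elite]
        # step 4: argmax of sum ln P_sigma(x_n) over the elite set,
        #         i.e. the per-coordinate empirical frequency of ones
        sigma = [sum(x[i] for x in elite) / n_elite for i in range(n_bits)]
    return sigma                                 # step 5: fixed iteration budget

sigma = ce_maximize(sum, 10)   # hypothetical reward f(x) = sum_i x_i
```

With this reward, σ is expected to tighten onto the all-ones vector, i.e. the Dirac at the maximizer, which is the behaviour described in the interpretation below.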

Interpretation. The CE algorithm tightens the law P_σ around the maximizer of f. Then, when the probabilistic family P is well suited to the maximization of f, finding a maximizer of f becomes equivalent to optimizing the parameter σ by means of the CE algorithm. The difficulty is to find a good family and good convergence parameters.

Extensions. 

Smooth update. The method has been extended by implementing a smooth update of the law. More precisely, assume the set {P_σ | σ ∈ Σ} to be convex, and let α ∈ [0, 1[ be a smoothing rate. The algorithm follows the synopsis:

1. Initialize σ ∈ Σ,

2. Generate N samples x_n according to P_σ,

3. Select the ρN best samples according to the reward criterion f,

4. Define σ_1 as a minimizer of the cross-entropy with the selected samples:

$$\sigma_1 \in \arg\max_{\sigma_1 \in \Sigma} \sum_{n\ \text{selected}} \ln P_{\sigma_1}(x_n),$$

5. Define σ_2 such that P_{σ_2} = α P_σ + (1 − α) P_{σ_1}, and update σ by setting σ := σ_2,

6. Repeat from step 2 until convergence.
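Steps 4 and 5 can be illustrated on a finite set of values, where the set of all categorical laws is convex, so that the mixture α P_σ + (1 − α) P_{σ_1} is again a member of the family. The sketch below assumes a categorical family with laws stored as probability lists; the function names are illustrative.

```python
def categorical_fit(selected, n_values):
    """Step 4 for the categorical family: the cross-entropy minimizer
    over the selected samples is the vector of empirical frequencies."""
    counts = [0] * n_values
    for v in selected:
        counts[v] += 1
    return [c / len(selected) for c in counts]

def smooth_update(p_sigma, p_sigma1, alpha):
    """Step 5: P_sigma2 = alpha * P_sigma + (1 - alpha) * P_sigma1."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("the smoothing rate alpha must lie in [0, 1[")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_sigma, p_sigma1)]

p = [0.5, 0.5]                          # current law P_sigma
p1 = categorical_fit([1, 1, 1, 0], 2)   # fitted law P_sigma1 = [0.25, 0.75]
p2 = smooth_update(p, p1, 0.9)          # approximately [0.475, 0.525]
```

With α = 0.9, only 10% of the newly fitted law enters the update, and α = 0 recovers the unsmoothed algorithm. For families that are not convex (such as products of Bernoulli laws), implementations commonly smooth the parameter vector instead, which is a different operation.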

Adaptive parameters. The principle is to make the parameters α and ρ depend on the iteration of the algorithm or on other contextual information. Adaptive parameters appear as a main ingredient in the different proofs of convergence of the method.
Sampling with rejection. In some examples (particularly the traveling salesman problem) considered in the CE tutorial [2], the law family (P_σ | σ ∈ Σ) does not match the set X of valid values for the variable x. More precisely, there is a set Y ⊃ X such that each P_σ is defined as a probability over Y. Such a law family can be used in the CE method by rejecting the invalid samples generated by P_σ. A slight change is implied in step 2 of the CE algorithm:

2. Repeat the following process for each n ∈ {1, ..., N}:

(a) Generate a sample x ∈ Y according to P_σ,

(b) If x ∉ X, then repeat from step (a),

(c) At this step, x ∈ X. Then set x_n = x.
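The modified step 2 can be sketched as a generic rejection loop. The callables `sample_y` (a draw from P_σ on Y) and `is_valid` (membership in X), and the `max_tries` safeguard, are hypothetical names introduced for illustration only.

```python
import random

def sample_with_rejection(sample_y, is_valid, n_samples, max_tries=1_000_000):
    """Modified step 2: draw N valid samples x_n in X from a law defined on Y."""
    samples = []
    tries = 0
    while len(samples) < n_samples:
        tries += 1
        if tries > max_tries:            # hypothetical guard against a tiny acceptance rate
            raise RuntimeError("too many rejections")
        x = sample_y()                   # (a) generate x in Y according to P_sigma
        if not is_valid(x):              # (b) x not in X: repeat from (a)
            continue
        samples.append(x)                # (c) x in X: set x_n = x
    return samples

rng = random.Random(1)
# toy case: Y = {0, ..., 9} with a uniform law, X = even numbers
xs = sample_with_rejection(lambda: rng.randrange(10), lambda x: x % 2 == 0, 20)
```

The accepted samples follow the conditional law of P_σ given x ∈ X, which is the point discussed next.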

There is no other change to the algorithm. In particular, the update step is the same: P_σ is updated from the selected values of the subset X.

At first sight, this update of the law is questionable with regard to the rejection. Indeed, the law actually sampled, and thus the law to be learned from the samples, is the conditional law

$$P_\sigma(\,\cdot \mid x \in X) = \frac{P_\sigma(\cdot)}{\sum_{x \in X} P_\sigma(x)} \quad \text{on } X.$$

This induces a different result when minimizing the cross-entropy with the selected samples.

However, the rejection could also be derived from a parameter adaptation: the idea is to interpret the invalid samples of Y \ X as samples with very bad reward. Then, the classical CE scheme is recovered by adapting the number of samples N and the selection parameter ρ in order to reject these invalid samples.

This last interpretation makes sense when the process actually converges to a law with support included in X. This is the case, for example, when the law converges to a Dirac at the optimum. Otherwise, it will be shown in section 3 that the convergence may be biased.

Convergence. Different convergence results have been proposed for the method and its extensions […]. The convergence requires a proper tuning of the parameters of the algorithm (selective rate, smoothing, number of samples). Essentially, these results have been established for the optimization of deterministic functions. Another issue is the stability of the optimization process when the family of laws (P_σ | σ ∈ Σ) does not necessarily match the optimal value properly. The questions investigated by this paper are:

• Does the classical CE algorithm solve stochastic problems properly? A negative answer is given subsequently. An evolution of the CE is proposed in order to deal with this problem.

• Assume that the closure of the law family does not contain the Dirac, or Dirac mixture, at the optimal solutions. Does the CE process provide the best approximation possible within the law family? A partial negative answer is provided in the next section, by producing a counterexample based on a sampling law with rejection. This counterexample does not work in the classical scheme of the CE. This paper does not clearly answer what the conditions on the CE process should be for guaranteeing such stability of the convergence. But one certainly has to be more careful in the choice and manipulation of the family.

3 When the family of laws does not enclose the optimum 

The following example is inspired by a convergence flaw diagnosed within a practical trajectory planning experiment carried out by Francis Celeste […], who works in our team.
Problem setting. It is assumed that an agent has two possible actions: the action continue or the action end. Each time the agent decides to continue, it receives the reward +1 and the process continues. When the agent decides to end, it still receives the reward +1 but the process terminates. Thus, the agent has to choose a sequence of actions which is a repetition of the action continue terminated by the action end:

continue; continue; ...; continue; end.

The reward for a whole sequence of actions is t, the length of the sequence. Now, a length constraint is imposed on the actions: the sequence cannot contain more than T actions, so that t ≤ T.

Optimal solution. The optimal solution is obvious. The agent will perform as many actions as possible. Its optimal sequence of actions is thus:

$$\underbrace{\text{continue};\ \cdots;\ \text{continue};\ \text{end}}_{T\ \text{actions}}$$

The problem is actually trivial. But we will see that for some law families, the CE with rejection fails to find the optimal law.

Proposal of a law family, and convergence issue. On such a simple example, the best choice is perhaps a law on the length of the sequence. But this kind of problem could easily be generalized to involve more than two possible actions (not only continue or end). A Markov chain is then generally used for these problems. In the traveling salesman problem, for example, the actions are the choice of a town; the salesman problem is solved in […] by means of a Markov chain with rejection. A method with rejection is investigated subsequently.

The purpose is to sample a sequence (d_θ | 1 ≤ θ ≤ t) where 1 ≤ t ≤ T, d_θ = continue for θ < t, and d_t = end. This sampling is done by means of a rejection method:

• Generate a sample without size constraint: (d_θ | 1 ≤ θ ≤ t) where 1 ≤ t, d_θ = continue for θ < t, and d_t = end,

• Reject the sample when t > T.

Sampling a sequence without size constraint. The actions are sampled independently, with the same law at each step, so that the sampling law of the sequence is characterized by the law p_λ for sampling a single action:

$$p_\lambda(d_\theta = \text{continue}) = \lambda \quad\text{and}\quad p_\lambda(d_\theta = \text{end}) = 1 - \lambda.$$

The whole process takes into account the ending state, so that the sample generation follows the synopsis:

1. Set t = 0,

2. Set t := t + 1,

3. Generate d_t by means of the law p_λ,

4. Repeat from step 2 until d_t = end.
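The synopsis translates directly into a loop. The sketch below assumes 0 ≤ λ < 1 so that the loop terminates, and returns only the length t, since a sequence is entirely determined by its length; the rejection method of the preceding bullet list is added as a second function. Function names are illustrative.

```python
import random

def sample_length(lam, rng):
    """Unconstrained sampling: actions drawn i.i.d. from p_lambda until 'end'."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lambda must lie in [0, 1[ for the loop to terminate")
    t = 0                                # 1. set t = 0
    while True:
        t += 1                           # 2. t := t + 1
        if rng.random() >= lam:          # 3. d_t = end with probability 1 - lambda
            return t                     # 4. stop once d_t = end

def sample_valid_length(lam, T, rng):
    """Rejection method: redraw whenever t > T."""
    while True:
        t = sample_length(lam, rng)
        if t <= T:
            return t

rng = random.Random(0)
lengths = [sample_length(0.5, rng) for _ in range(20000)]
# the frequency of t = 1 should be close to lambda^0 * (1 - lambda) = 0.5
freq_t1 = lengths.count(1) / len(lengths)
```

The empirical frequencies of the lengths can be compared with the geometric probabilities given next.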

As a consequence, the probability of a full sequence d = (d_θ | 1 ≤ θ ≤ t) is given by:

$$P_\lambda(d) = \lambda^{t-1}(1-\lambda).$$

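Optimal law. The optimal law is the one which yields the best gain expectation for the valid trajectories generated by P_λ. The gain expectation after rejection is given by:

$$E_{P_\lambda}(t \mid t \le T) = \frac{\sum_{t=1}^{T} t\,\lambda^{t-1}(1-\lambda)}{\sum_{t=1}^{T} \lambda^{t-1}(1-\lambda)} = \frac{\sum_{t=1}^{T} t\,\lambda^{t-1}}{\sum_{t=1}^{T} \lambda^{t-1}}.$$

This expectation is maximized when λ = 1 (the conditional law at λ = 1 being defined by continuity, i.e. uniform on {1, ..., T}):

$$\max_{0<\lambda\le 1} E_{P_\lambda}(t \mid t \le T) = E_{P_1}(t \mid t \le T) = \frac{1}{T}\sum_{t=1}^{T} t = \frac{T+1}{2}.$$

Within the family, the optimal law is P_1.

Notice that this optimal distribution is not an optimum for the problem. The family (P_λ | 0 < λ ≤ 1) is not rich enough to capture the optimum of the function.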

It is sometimes not possible to provide a family able to capture the global optimum of the function. It is then often sufficient to find the optimal distribution within the family. Is the CE able to provide such an optimal distribution within the family? It is shown subsequently on the example that the CE (with rejection) does not converge to the optimal law P_1.

Updating the law. Assume that M = ρN samples (d^n | 1 ≤ n ≤ M) have been obtained after a sampling process (with rejection) and a selection of the best samples. Denote by t_n the ending time of sequence d^n (beware: n is an index, not a power!).

The parameter λ for the next loop of the CE algorithm is obtained by minimizing the cross-entropy with the selected samples, i.e. by maximizing their average log-likelihood:

$$\lambda \in \arg\max_{0<\lambda<1} \frac{1}{M}\sum_{n=1}^{M} \ln P_\lambda(d^n).$$



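Now:

$$\frac{1}{M}\sum_{n=1}^{M} \ln P_\lambda(d^n) = \frac{1}{M}\sum_{n=1}^{M} \ln\bigl(\lambda^{t_n-1}(1-\lambda)\bigr) = \Bigl(\frac{1}{M}\sum_{n=1}^{M} t_n - 1\Bigr)\ln\lambda + \ln(1-\lambda).$$

Setting the derivative with respect to λ to zero, the maximization results in the relation:

$$\Bigl(\frac{1}{M}\sum_{n=1}^{M} t_n - 1\Bigr)\frac{1}{\lambda} = \frac{1}{1-\lambda}.$$

At last, the following update relation is derived:

$$\lambda = 1 - \frac{M}{\sum_{n=1}^{M} t_n}. \qquad (1)$$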
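The whole CE-with-rejection loop on this example can be simulated using update relation (1). This is a sketch with assumed parameter values (N, ρ, initial λ and the iteration count are illustrative choices). Samples are represented by their lengths t_n, the reward being t, and generation stops as soon as t exceeds T, since such a sample is rejected anyway.

```python
import random

def ce_with_rejection(T, lam=0.5, n_samples=1000, rho=0.1, n_iter=30, seed=0):
    rng = random.Random(seed)
    n_elite = int(rho * n_samples)                 # M = rho * N
    for _ in range(n_iter):
        lengths = []
        while len(lengths) < n_samples:
            t = 1
            while t <= T and rng.random() < lam:   # continue with probability lambda
                t += 1
            if t <= T:                             # reject sequences longer than T
                lengths.append(t)
        elite = sorted(lengths, reverse=True)[:n_elite]   # best rewards t_n
        lam = 1 - n_elite / sum(elite)             # update relation (1)
    return lam

T = 5
lam_final = ce_with_rejection(T)
print(lam_final)   # bounded by 1 - 1/T = 0.8, never approaching 1
```

Since every accepted t_n is at most T, the mean length is at most T and (1) can never push λ above 1 − 1/T, whatever the sample size.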



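Convergence issue. Since the rejection constraint ensures t_n ≤ T for every selected sample, equation (1) implies that λ ≤ 1 − 1/T after each update. As a consequence, the CE does not converge to P_1, the optimal distribution within the family. The bound therefore also holds for the law P_{λ*} obtained after convergence of the CE and, since E_{P_λ}(t | t ≤ T) increases with λ:

$$E_{P_{\lambda^*}}(t \mid t \le T) \le \frac{\sum_{t=1}^{T} t\,(1-1/T)^{t-1}}{\sum_{t=1}^{T} (1-1/T)^{t-1}}.$$

Let us consider the simple case T = 2 and compare the expectations:

$$E_{P_1}(t \mid t \le T) = \frac{1+2}{1+1} = \frac{3}{2} \quad\text{and}\quad E_{P_{\lambda^*}}(t \mid t \le T) \le \frac{1 + 2\times\frac{1}{2}}{1+\frac{1}{2}} = \frac{4}{3}.$$

The relative difference, at least (3/2 − 4/3)/(3/2) = 1/9 ≈ 11%, is not negligible.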
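The T = 2 comparison can be checked with exact rational arithmetic. The helper below (a hypothetical name) evaluates E_{P_λ}(t | t ≤ T) from its closed form; the value at λ = 1 − 1/T is an upper bound for the converged CE because the expectation increases with λ.

```python
from fractions import Fraction

def expected_length(lam, T):
    """E_{P_lambda}(t | t <= T) = sum_t t lam^(t-1) / sum_t lam^(t-1), t = 1..T."""
    num = sum(t * lam ** (t - 1) for t in range(1, T + 1))
    den = sum(lam ** (t - 1) for t in range(1, T + 1))
    return Fraction(num) / Fraction(den)

T = 2
best = expected_length(Fraction(1), T)           # P_1: (1 + 2) / (1 + 1)
bound = expected_length(1 - Fraction(1, T), T)   # lambda = 1 - 1/T
gap = (best - bound) / best
print(best, bound, gap)                          # 3/2 4/3 1/9
```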

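Convergence in the classical CE scheme. As discussed in section 2, the update of λ within the classical scheme is obtained by minimizing the cross-entropy of the conditional law:

$$P_\lambda(d \mid t \le T) = \frac{\lambda^{t-1}(1-\lambda)}{1-\lambda^T}$$

with the selected samples. Thus, the update is expressed by:

$$\lambda \in \arg\max_{0<\lambda<1} \frac{1}{M}\sum_{n=1}^{M} \ln\frac{\lambda^{t_n-1}(1-\lambda)}{1-\lambda^T}.$$

Defining the mean length t̄ = (1/M) Σ_{n=1}^{M} t_n, the optimization reduces to:

$$\lambda \in \arg\max_{0<\lambda<1} \frac{\lambda^{\bar t-1}(1-\lambda)}{1-\lambda^T}.$$

The maximum of this function is not necessarily located at λ = 1. For example, when t̄ = 1, the function reduces to 1/(1 + λ + ... + λ^{T−1}), which is decreasing, so that the optimum is obtained for λ → 0. Now, the function to be optimized can be rewritten:

$$\frac{\lambda^{\bar t-1}(1-\lambda)}{1-\lambda^T} = \frac{\lambda^{\bar t-1}}{\sum_{k=0}^{T-1}\lambda^{k}} = \frac{1}{\sum_{k=0}^{T-1}\lambda^{k+1-\bar t}},$$

which extends continuously to λ = 1 with value 1/T. When t̄ ≥ T, every exponent k + 1 − t̄ is non-positive, so that the denominator is non-increasing in λ. Then, it is deduced:

$$\bar t \ge T \;\Longrightarrow\; 1 \in \arg\max_{0<\lambda\le 1} \frac{1}{\sum_{k=0}^{T-1}\lambda^{k+1-\bar t}}. \qquad (2)$$

Equation (2) has a clear interpretation: when λ is large enough at initialization and the selective rate ρ is sufficiently small (so that all the selected samples have length T, i.e. t̄ = T), the CE algorithm (without rejection) converges to the optimal law P_1. As a conclusion, our counterexample fails in the classical CE paradigm.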
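The contrast with update (1) can be seen numerically by maximizing the classical-scheme objective, in its rewritten form, over a grid of λ values. This is a rough sketch: the grid resolution is an arbitrary assumption and the helper names are illustrative.

```python
def classical_objective(lam, t_bar, T):
    """1 / sum_{k=0}^{T-1} lam^(k + 1 - t_bar); equals 1/T at lam = 1 by continuity."""
    return 1.0 / sum(lam ** (k + 1 - t_bar) for k in range(T))

def classical_update(t_bar, T, grid_size=1000):
    """Grid approximation of the argmax over 0 < lam <= 1."""
    grid = [k / grid_size for k in range(1, grid_size + 1)]
    return max(grid, key=lambda lam: classical_objective(lam, t_bar, T))

T = 5
print(classical_update(T, T))   # t_bar = T: the maximum sits at lam = 1
print(classical_update(1, T))   # t_bar = 1: smallest grid point, i.e. lam -> 0
print(1 - 1 / T)                # what update (1) with rejection gives for t_bar = T
```

For the same selected samples with t̄ = T, the conditional-law update reaches λ = 1, whereas update (1) stops at 1 − 1/T.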

Discussion. The previous example has shown a convergence issue of the CE with rejection when the law family cannot reach the optimum of the function. This counterexample does not work with the classical CE scheme. In general, even when the family cannot capture the optimum exactly, the convergence still works rather well in the classical CE paradigm. Many questions arise, however. In particular, how can the stability of the convergence be evaluated and enhanced with regard to the discrimination weakness of the law family?

4 When the problem is stochastic 

In this section, we discuss the convergence of the CE in the case of stochastic optimization. Notice that it is still possible to bring such stochastic problems back to deterministic problems by computing the expectation of the objective function. But generally, this computation is obtained by simulation and is costly. A reduction of the cost could be obtained by means of the method described in section 4.2.1.

When the variable to be optimized and the stochastic variable of the system are dependent, computing the expectation makes it necessary to use a functional abstraction of the variable to be optimized (instead of conditional laws). This is again somewhat costly. Moreover, the cost reduction method described in section 4.2.1 is no longer feasible (when the variable of the system depends on the variable to be optimized).

The purpose of this section is to consider stochastic optimization by the CE without computing the expectation of the objective. It is shown on simple examples that there may be a real convergence difficulty for the CE method under such conditions.

In the first example below, the value to be optimized is conditioned by another variable which is stochastic. In other words, the value to be optimized could be considered as a function of the stochastic variable. Such problems do not classically appear in the CE literature, but they clearly illustrate some typical convergence difficulties. The second example is unconditioned and more classical. These examples are complemented by a study of randomly generated stochastic optimization problems (here, without conditioning). Alternative solutions to the classical CE are then proposed and compared.
4.1 Examples
4.1.1 Example 1 

Typically, there is an additional difficulty in evaluating the expectation of the objective function when the variables to be optimized are conditioned by the variable of the system. For this reason, we start by considering this kind of example.

Let us consider the following stochastic problem:

$$f_0 \in \arg\max_{f} \sum_{x} p(x)\,V\bigl(f(x), x\bigr), \qquad (3)$$

where x ∈ {0, 1}, d ∈ {0, 1}, p(0) = p(1) = 1/2, V(d, x) = 2x + d, and f is a mapping from x to d.

This problem could be seen from a probabilistic viewpoint:

$$h_0 \in \arg\max_{h} \sum_{x}\sum_{d} p(x)\,h(d \mid x)\,V(d, x), \qquad (4)$$

where x ∈ {0, 1}, d ∈ {0, 1}, p(0) = p(1) = 1/2, V(d, x) = 2x + d, and h(d | x) is a probability of d conditionally to x.

We will apply a cross-entropic method in order to solve the optimization (4). Notice that the method differs slightly from the usual one, since we are handling a family of conditional laws.

Direct solve. The obvious answer to this problem is h(0 | x) = 0 and h(1 | x) = 1; the optimal gain is 2.

Cross-entropic solve. A cross-entropic procedure is proposed here with quantile selection ρ = 10% (no smooth update, for simplicity) in order to solve (4):

• Initialize h by h(0 | 0) = h(1 | 0) = h(0 | 1) = h(1 | 1) = 1/2,

• Draw 100 samples (d, x) and evaluate them by V,

• [*] Select the 10% best samples and update h(· | x) from the selected samples, when possible¹. Reiterate from the previous step.

Since V(d_1, 1) > V(d_2, 0) for any choice of d_1 and d_2, it follows that samples (d, 0) are (almost) never² selected. As a consequence, h(· | 0) is (almost) never updated and stays unchanged. Thus, in practice the process converges to the solution h(0 | 0) = h(1 | 0) = 1/2, h(0 | 1) = 0, h(1 | 1) = 1, which is sub-optimal. The expected gain is then (1/2)(1/2) + (1/2)(3) = 7/4, instead of 2.
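The procedure [*] can be simulated directly. The sketch below assumes the stated setting (100 samples, ρ = 10%, no smoothing) and an arbitrary iteration count; h is stored as h[x] = [h(0 | x), h(1 | x)], and footnote 1 is implemented by skipping the conditions that have no selected sample. Names are illustrative.

```python
import random

def V1(d, x):
    return 2 * x + d                                    # V(d, x) = 2x + d

def ce_example1(n_samples=100, rho=0.1, n_iter=20, seed=0):
    rng = random.Random(seed)
    h = {0: [0.5, 0.5], 1: [0.5, 0.5]}                  # h[x][d] = h(d | x)
    n_elite = int(rho * n_samples)
    for _ in range(n_iter):
        pairs = []
        for _ in range(n_samples):
            x = rng.randrange(2)                        # x ~ p, p(0) = p(1) = 1/2
            d = 1 if rng.random() < h[x][1] else 0      # d ~ h(. | x)
            pairs.append((d, x))
        elite = sorted(pairs, key=lambda dx: V1(*dx), reverse=True)[:n_elite]
        for x in (0, 1):
            ds = [d for d, xx in elite if xx == x]
            if ds:                                      # footnote 1: else keep h(. | x)
                frac1 = sum(ds) / len(ds)
                h[x] = [1 - frac1, frac1]
    return h

def expected_gain(h):
    return sum(0.5 * h[x][d] * V1(d, x) for x in (0, 1) for d in (0, 1))

h = ce_example1()
print(h, expected_gain(h))   # expected: h[0] stays [0.5, 0.5]; gain 1.75 instead of 2
```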

This example contains a specific difficulty: we are indeed optimizing the function x ↦ f(x) by means of a conditional law. One may argue that [*] is not a good updating strategy, since the samples should be selected relative to each condition x. But this is not possible when there are many possible conditions x (which is often the case).

4.1.2 Example 2 

It could be argued about the previous example that the use of a conditional family is not the classical scheme for applying the CE method. The following example relates to a more classical scheme.



¹ Leave h(· | x) unchanged when there are no selected samples conditioned by x.
² The probability of such a selection is negligible (around 10^{…}).
Now, let us solve the following stochastic problem:

$$d_0 \in \arg\max_{d} \sum_{x} p(x)\,V(d, x), \qquad (5)$$

where x ∈ {0, 1}, d ∈ {0, 1}, p(0) = p(1) = 1/2, and V(0, 0) = 2, V(0, 1) = −2, V(1, 0) = V(1, 1) = 1.

From a CE viewpoint, the problem becomes:

$$h_0 \in \arg\max_{h} \sum_{x}\sum_{d} p(x)\,h(d)\,V(d, x), \qquad (6)$$

where x ∈ {0, 1}, d ∈ {0, 1}, p(0) = p(1) = 1/2, V(0, 0) = 2, V(0, 1) = −2, V(1, 0) = V(1, 1) = 1, and h is a probability of d.

Direct solve. The expected gain is 0 for d = 0 and 1 for d = 1, so the optimal solution of (6) is of course h_0(0) = 0 and h_0(1) = 1, resulting in the gain 1.

Cross-entropic solve. A cross-entropic procedure is proposed here with quantile selection ρ = 10% (no smooth update, for simplicity) in order to solve (6):

• Initialize h by h(0) = h(1) = 1/2,

• Draw 100 samples (d, x) and evaluate them by V,

• Select the 10% best samples and update h from the selected samples. Reiterate from the previous step.

Since V(0, 0) > V(d, x) for any (d, x) ≠ (0, 0), it follows that the samples (d, x) ≠ (0, 0) are (almost) never selected. As a consequence, the selected samples are (0, 0) from the beginning of the CE process. Consequently, the CE process converges to the sub-optimal solution h*(0) = 1 and h*(1) = 0, resulting in the gain 0.
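A short simulation reproduces the collapse onto d = 0. The sample size, ρ = 10% and the absence of smoothing follow the text; the iteration count is an arbitrary assumption, and h is stored as [h(0), h(1)].

```python
import random

V2 = {(0, 0): 2, (0, 1): -2, (1, 0): 1, (1, 1): 1}   # V(d, x) of problem (5)

def ce_example2(n_samples=100, rho=0.1, n_iter=20, seed=0):
    rng = random.Random(seed)
    h = [0.5, 0.5]                                     # h = [h(0), h(1)]
    n_elite = int(rho * n_samples)
    for _ in range(n_iter):
        pairs = [(1 if rng.random() < h[1] else 0,     # d ~ h
                  rng.randrange(2))                    # x ~ p, independent of d
                 for _ in range(n_samples)]
        elite = sorted(pairs, key=lambda dx: V2[dx], reverse=True)[:n_elite]
        frac1 = sum(d for d, _ in elite) / n_elite     # quantile-selected frequency of d = 1
        h = [1 - frac1, frac1]
    return h

def expected_gain2(h):
    return sum(0.5 * h[d] * V2[(d, x)] for d in (0, 1) for x in (0, 1))

h = ce_example2()
print(h, expected_gain2(h))   # expected: [1.0, 0.0] with gain 0 instead of 1
```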

4.1.3 Discussion. 

The two previous examples are instructive. It appears clearly that the selection scheme of the CE (selection of a quantile) does not work properly with regard to a stochastic objective. Indeed, some configurations of the problem, which are drawn by the law of the system rather than chosen by the optimizer, are automatically discarded by the quantile selection process. By discarding these cases, a convergence bias is generated.

4.2 Alternative methods 

4.2.1 Computing the expectation (reduced cost) 

This method is not exactly an alternative: it is costly. But it is provided as a reference for the test comparison. The idea is to replace the stochastic objective function V(d, x) by an estimate of its expectation. This expectation is obtained by sampling over x according to the law p of the system. The more samples are used, the more accurate the estimate. Here, the same samples of x are used for computing the expected gain of all the samples d_n. This greatly reduces the complexity. But such a method is not feasible when the variables x and d are dependent. The whole algorithm is as follows:

1. Initialize h,

2. Generate N samples d_n according to h,

3. Generate K samples x_k according to p,

4. Evaluate each sample d_n by the estimated expectation

$$v_n = \frac{1}{K}\sum_{k=1}^{K} V(d_n, x_k),$$

5. Select the ρN best samples d_n according to the estimated expectation v_n,

6. Update h as a minimizer of the cross-entropy with the selected samples:

$$h \in \arg\max_{h} \sum_{n\ \text{selected}} \ln h(d_n),$$

7. Repeat from step 2 until convergence.
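A compact sketch of this reduced-cost variant follows, for finite sets of values of d and x. It is illustrative only: the function and parameter names are assumptions, h is a list of probabilities, the smoothing of section 2.1 is included through α (α = 0 gives the unsmoothed synopsis above), and the shared samples x_k implement the cost reduction, which requires x to be independent of d.

```python
import random

def ce_expectation(V, p, n_d, N=100, K=100, rho=0.1, alpha=0.0, n_iter=50, seed=0):
    rng = random.Random(seed)
    h = [1.0 / n_d] * n_d                                    # 1. initialize h (uniform)
    n_elite = max(1, int(rho * N))
    for _ in range(n_iter):
        ds = rng.choices(range(n_d), weights=h, k=N)         # 2. d_n ~ h
        xs = rng.choices(range(len(p)), weights=p, k=K)      # 3. x_k ~ p, shared by all d_n
        v = [sum(V(d, x) for x in xs) / K for d in ds]       # 4. v_n = (1/K) sum_k V(d_n, x_k)
        best = sorted(range(N), key=lambda n: v[n], reverse=True)[:n_elite]
        elite = [ds[n] for n in best]                        # 5. rho*N best samples
        fit = [elite.count(d) / n_elite for d in range(n_d)] # 6. cross-entropy fit
        h = [alpha * a + (1 - alpha) * b for a, b in zip(h, fit)]  # optional smoothing
    return h

# Example 2 again: the estimated expectations now favour d = 1
V2 = {(0, 0): 2, (0, 1): -2, (1, 0): 1, (1, 1): 1}
h = ce_expectation(lambda d, x: V2[(d, x)], [0.5, 0.5], n_d=2, n_iter=20)
```

Because the ranking uses the estimated expectation (about 0 for d = 0 and exactly 1 for d = 1), the quantile selection no longer discards the case x = 1 and the sketch converges to the optimal law of Example 2.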

4.2.2 Using another selection scheme for the CE 

The idea here is to change the selection scheme of the CE. The stochastic objective function V(d, x) is used directly. As in section 4.1.2, the stochastic pair (d, x) is sampled and evaluated at the same time.

Selection scheme. Assume that N samples (d_n, x_n) have been evaluated by v_n = V(d_n, x_n). A non-decreasing function R is defined, which characterizes the importance R(v_n) of each sample (d_n, x_n). The update of h is computed as a minimizer of the cross entropy with the discrete weighted distribution

$$\Bigl(d_n,\ \frac{R(v_n)}{\sum_{m=1}^{N} R(v_m)}\Bigr)_{1 \le n \le N}.$$

Algorithm. The whole algorithm is as follows:

1. Initialize h,

2. Generate N samples d_n according to h and N samples x_n according to p,

3. Evaluate each sample pair (d_n, x_n) by v_n = V(d_n, x_n),

4. Update h as a minimizer of the cross-entropy with the weighted samples:

$$h \in \arg\max_{h} \sum_{n=1}^{N} R(v_n) \ln h(d_n),$$

5. Repeat from step 2 until convergence.

This selection scheme is called the smooth selection scheme. Notice that the quantile selection of Rubinstein is a particular case of the smooth selection scheme, where the function R is a Heaviside step function located at the quantile.
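The smooth selection scheme can be sketched for finite sets of values of d and x. In this illustration the weighting is passed as a function of the whole vector of values, so that both a smooth importance R applied element-wise and Rubinstein's quantile selection (a Heaviside step at the quantile, which depends on the sample) fit the same interface. The shifted linear importance R(v) = v + 2 used on Example 2 is a hypothetical choice, made only because V takes negative values there; it is not taken from the text. All names are illustrative.

```python
import random

def quantile_weights(values, rho=0.1):
    """Rubinstein's quantile selection as 0/1 weights (ties at the threshold are all kept)."""
    n_elite = max(1, int(rho * len(values)))
    threshold = sorted(values, reverse=True)[n_elite - 1]
    return [1.0 if v >= threshold else 0.0 for v in values]

def ce_smooth_selection(V, p, n_d, weights, N=100, alpha=0.9, n_iter=300, seed=0):
    rng = random.Random(seed)
    h = [1.0 / n_d] * n_d                                # 1. initialize h
    for _ in range(n_iter):
        ds = rng.choices(range(n_d), weights=h, k=N)     # 2. d_n ~ h
        xs = rng.choices(range(len(p)), weights=p, k=N)  #    x_n ~ p
        v = [V(d, x) for d, x in zip(ds, xs)]            # 3. v_n = V(d_n, x_n)
        w = weights(v)                                   #    importances R(v_n)
        total = sum(w)
        if total <= 0:                                   # nothing to learn from: keep h
            continue
        fit = [0.0] * n_d                                # 4. weighted cross-entropy fit:
        for d, wn in zip(ds, w):                         #    h(d) proportional to the total
            fit[d] += wn / total                         #    weight of the samples d_n = d
        h = [alpha * a + (1 - alpha) * b for a, b in zip(h, fit)]
    return h

V2 = {(0, 0): 2, (0, 1): -2, (1, 0): 1, (1, 1): 1}
smooth = lambda vs: [v + 2 for v in vs]                  # hypothetical R(v) = v + 2 >= 0
h = ce_smooth_selection(lambda d, x: V2[(d, x)], [0.5, 0.5], 2, smooth)
```

With `smooth`, the expected importance of d = 1 is 3 against (4 + 0)/2 = 2 for d = 0, so h drifts toward d = 1; passing `quantile_weights` instead recovers a smoothed basic CE, which drifts toward d = 0 on this example.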

4.3 Method comparison by means of randomly generated tests

The three methods (basic CE, CE with expectation computation, and smooth selection scheme) have been compared on random problems. The method for creating the problems is simple:

• There are 100 possible states for d and for x, that is d, x ∈ {1, ..., 100},

• The parameters V(d, x) ∈ ]0, 1] are generated randomly according to the uniform law, for every d and every x,

• The probability p is generated randomly according to the uniform law (that is, the 99-dimensional vector characterizing p is generated uniformly).
Notice that these problems are quite easy to solve by enumerating the cases.
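The random test problems and their exact solution by enumeration can be sketched as follows. Two points are interpretations rather than statements of the text: states are indexed from 0 instead of 1, and "p generated according to the uniform law" is read as the uniform (flat Dirichlet) law on the probability simplex, sampled by normalizing independent exponential variables. `percent_of_optimum` is a hypothetical helper computing the kind of score reported in the table below.

```python
import random

def random_problem(n=100, seed=0):
    rng = random.Random(seed)
    # V[d][x] uniform on ]0, 1]: 1 - random() lies in (0, 1]
    V = [[1.0 - rng.random() for _ in range(n)] for _ in range(n)]
    # p uniform on the simplex: normalized i.i.d. Exp(1) draws (flat Dirichlet)
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    p = [ei / total for ei in e]
    return V, p

def gain(h, V, p):
    """Expected gain sum_d h(d) sum_x p(x) V(d, x)."""
    return sum(hd * sum(px * vdx for px, vdx in zip(p, row)) for hd, row in zip(h, V))

def optimum_by_enumeration(V, p):
    """Enumerate the deterministic choices d and keep the best expected gain."""
    gains = [sum(px * vdx for px, vdx in zip(p, row)) for row in V]
    best = max(range(len(V)), key=gains.__getitem__)
    return best, gains[best]

def percent_of_optimum(h, V, p):
    return 100.0 * gain(h, V, p) / optimum_by_enumeration(V, p)[1]

V, p = random_problem()
d_star, opt = optimum_by_enumeration(V, p)
```

Since the expected gain of any law h is a convex combination of the gains of the deterministic choices, no law can score above 100%.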
The test has been executed 1000 times. The parameters of the algorithm are:

• K = N = 100 and ρ = 10%,

• The update is smoothed by α = 0.9 (i.e. the innovation is 10%),

• The importance function R is defined by R(v_n) = …

The following table gives the percentage of the optimum achieved by each method. These results are averaged over the 1000 executed tests, and the variance is also given.



Optimal percentage 


Mean 


Variance 


Basic CE 


93.9% 


3.7% 


Expectation 


99% 


0.4% 


Smooth scheme 


99.1% 


0.7% 



The convergence speed of the expectation CE and of the smooth selection CE was comparable. Since the expectation is computed at reduced cost, both methods run with a similar computation cost.

5 Conclusion 

This paper has investigated the convergence issues of the cross-entropy method when its conditions of use are relaxed. A counterexample has been found for the CE with rejection, when the law family used for the CE is too weak and does not contain the Dirac at the optimum. Counterexamples have also been found when optimizing a stochastic objective function. A weak family and a stochastic objective are very important contexts of use of the CE algorithm; both difficulties are encountered, for instance, when optimizing a control with partial observation. An alternative evolution of the CE has been proposed for stochastic optimization. It is based on a smooth scheme for the sample selection. The convergence with weak law families is still an open question, and future work will focus on this difficult problem. Moreover, the proof of convergence of the smooth selection scheme will be investigated; at this time, this method has been evaluated only experimentally.